ABSTRACT

Assumptions which could motivate various L-estimators and M-estimators
are discussed. In particular, for samples of size ten, a distribution in the
contaminated exponential power family is found whose posterior mean approximates
each of a number of proposed L-estimators. Also, distributions in this family
are found whose posterior modes approximate suggested M-estimators.
SIGNIFICANCE AND EXPLANATION


In recent years attempts have been made to obtain "robust estimators";
that is, functions of the data which produce estimates which are less sensitive
to non-normality of the error distribution and to occasional "bad" observations.
Some of the methods proposed are highly empirical. In this paper
an attempt is made to link the variously proposed robust estimators to particular
models. It is then possible to say of a given estimator that it would
be applicable if some specified model were believed roughly to represent
the process of data generation.


The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 
summary  lies  with  MRC,  and  not  with  the  authors  of  this  report. 


IMPLIED  ASSUMPTIONS  FOR  SOME  PROPOSED 
ROBUST  ESTIMATORS 

Gina  Chen  and  George  E.  P.  Box 

1. Introduction

Statistical inferences should depend on the parent distribution
from which the data are assumed to come. The sample mean, for
example,  is  a  good  estimator  for  the  location  parameter  of  the  normal 
distribution,  but  is  not  necessarily  good  for  the  double  exponential 
or  rectangular  distribution.  Therefore,  in  doing  data  analysis,  one 
always  faces  the  difficulty  of  making  appropriate  inferences  because 
the  parent  distribution  of  the  data  is  never  known. 

The  Normal  distribution  was  introduced  by  Gauss  as  follows: 
(Gauss  (1821),  also  Huber  (1972)) 

"The  author  of  the  present  treatise,  who  in  the  year  1797 
first investigated this problem according to the principles of the
theory of probability, soon realized that it was impossible to determine
the most probable value of the unknown quantity, unless the
function representing the errors is known. But since it is not,
there is no other recourse than to assume such a function in a
hypothetical fashion. It seemed most natural to him to take the
opposite approach and to look for that function which must be taken
as a base in order that for the simplest of all cases a rule is
obtained which is generally accepted as a good one, namely, that
the arithmetic mean of several observations of equal accuracy for
one and the same quantity should be considered the most accurate
value. This implied that the probability of an error x must be
assumed proportional to an exponential expression of the form
$e^{-hhxx}$, and that then just the same method which he had found by
other considerations already a few years earlier, would become necessary
in general."

Note  that  here  Gauss  introduced  the  normal  distribution  to 
suit the sample mean, an estimator which was thought to be good.
Over the years, many other estimators for location have been proposed
and claimed to be good on an empirical basis. For each such estimator,
one could ask the question "Which distribution is this estimator
suitable for?" or alternatively "What type of distribution does
the  advocate  of  the  estimator  really  have  in  mind?"  We  discuss  this 
problem  in  this  chapter. 

We shall consider a class of distributions capable of representing
a wide range of behavior. For each of a number of proposed
estimators,  we  will  then  try  to  find  a  distribution  in  this  class  for 
which  the  estimator  is  appropriate.  We  will  now  be  more  specific. 

2. Some Robust Estimators for Location

Robust  estimators  are  estimators  which  are  believed  to  be 
good  for  a  broad  class  of  distributions  but  not  necessarily  best 
for  any  one  of  them.  There  are  different  methods  for  constructing 
a  robust  estimator. 

Suppose $X_1, X_2, \ldots, X_N$ are observations drawn from some
unknown distribution, and $X_{(1)} \le X_{(2)} \le \cdots \le X_{(N)}$ are the ordered
observations. The following are some of the robust estimators which
have been proposed for the location parameter of a single sample.

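(i) "Maximum likelihood type" estimators (M-estimators)

If the distribution is known to be $F\left(\frac{x-\theta}{\sigma}\right)$ with density
$f\left(\frac{x-\theta}{\sigma}\right)$, then the maximum likelihood estimate of $\theta$ is given by $\hat\theta$
which maximizes the following expression:

$\prod_{i=1}^{N} f\left(\frac{x_i-\theta}{\sigma}\right)$

or equivalently maximizes

$\sum_{i=1}^{N} \log f\left(\frac{x_i-\theta}{\sigma}\right)$.

$\hat\theta$ will then satisfy the following equation

$\sum_{i=1}^{N} \frac{f'\left(\frac{x_i-\hat\theta}{\sigma}\right)}{f\left(\frac{x_i-\hat\theta}{\sigma}\right)} = 0$  or  $\sum_{i=1}^{N} \psi\left(\frac{x_i-\hat\theta}{\sigma}\right) = 0$, where $\psi = -\frac{f'}{f}$.

An M-estimator ($\hat\theta$) was defined by Huber to be the solution of an equation

$\sum_{i=1}^{N} \psi\left(\frac{x_i-\hat\theta}{s}\right) = 0$, where $\psi$ is some specifically chosen function.

In general, $\hat\theta$ is solved by iteration, and s is some estimate of the
scale parameter. Some examples of the $\psi$ function chosen for
M-estimators are: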
(1) $\psi(x) = x$

This is the special case that gives as estimator the sample mean.

(2) Huber's (Huber 1964, Andrews et al. 1972)

$\psi(x,k) = \begin{cases} -k, & x < -k \\ x, & -k \le x \le k \\ k, & k < x \end{cases}$

with k = 1.5 or 2.0.

(3) Andrews' (Andrews et al. 1972)

$\psi(x,c) = \begin{cases} \sin(x/c), & |x| \le c\pi \\ 0, & \text{o.w.} \end{cases}$

with c = 2.1 x .6754 ≈ 1.42.

(4) Hampel's (c.f. Andrews et al. 1972)

$\psi(x,a,b,c) = \operatorname{sgn}(x) \begin{cases} |x|, & 0 \le |x| < a \\ a, & a \le |x| < b \\ a\,\frac{c-|x|}{c-b}, & b \le |x| < c \\ 0, & \text{o.w.} \end{cases}$

with tabulated values of c: 6.42, 3.98, 5.54.

(5) Tukey's biweight (Beaton & Tukey (1974))

$\psi(x,c) = \begin{cases} x\left[1-\left(\frac{x}{c}\right)^2\right]^2, & |x| \le c \\ 0, & \text{o.w.} \end{cases}$

with c = 4.05, 5.40 or 6.10.

The values given for the constants k, a, b, c, etc. are those most
commonly used. In Hampel's, Andrews' and Tukey's M-estimators, s
is usually estimated by the median of the absolute deviations from the
median. Such a robust scale estimate has expectation $.6754\sigma$ under
normality and Huber standardizes by dividing by this constant. For
comparison purposes, we have adjusted the values of a, b, and c such
that in all cases the scale estimate used has expected value $\sigma$ on
the assumption of normality.
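As an illustration of how an M-estimate might be computed in practice, the following is a minimal Python sketch. It assumes NumPy; the function names (`psi_huber`, `psi_andrews`, `psi_biweight`, `mad_scale`, `m_estimate`) and the small data set are hypothetical, and the iteratively reweighted averaging scheme is only one possible way of carrying out the iteration mentioned above, not necessarily the one used by the authors cited.

```python
import numpy as np

def psi_huber(u, k=1.5):
    """Huber's psi: u inside [-k, k], clipped to -k or k outside."""
    return np.clip(u, -k, k)

def psi_andrews(u, c=1.42):
    """Andrews' psi: sin(u/c) for |u| <= c*pi, zero otherwise."""
    return np.where(np.abs(u) <= c * np.pi, np.sin(u / c), 0.0)

def psi_biweight(u, c=4.05):
    """Tukey's biweight psi: u*(1-(u/c)^2)^2 for |u| <= c, zero otherwise."""
    return np.where(np.abs(u) <= c, u * (1.0 - (u / c) ** 2) ** 2, 0.0)

def mad_scale(x):
    """Median absolute deviation from the median, divided by .6754
    (the constant quoted in the text) so that it estimates sigma
    under normality."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x))) / 0.6754

def m_estimate(x, psi, max_iter=100, tol=1e-12):
    """Solve sum_i psi((x_i - theta)/s) = 0 for theta by iteratively
    reweighted averaging, starting from the sample median.
    (Assumed algorithm, chosen for simplicity.)"""
    x = np.asarray(x, dtype=float)
    s = mad_scale(x)
    theta = float(np.median(x))
    if s == 0.0:  # degenerate sample: more than half the values coincide
        return theta
    for _ in range(max_iter):
        u = (x - theta) / s
        u_safe = np.where(u == 0.0, 1e-12, u)  # avoid 0/0 in the weights
        w = psi(u_safe) / u_safe               # weight w(u) = psi(u)/u
        new_theta = float(np.sum(w * x) / np.sum(w))
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta

# Hypothetical sample with one gross outlier.
sample = np.array([-1.2, -0.5, 0.1, 0.3, 0.8, 1.1, 50.0])
print("Huber:", m_estimate(sample, psi_huber))
print("mean :", np.mean(sample))
```

With $\psi(x) = x$ every weight equals one, so the iteration returns the sample mean after a single step, which mirrors special case (1) above.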


(ii) Linear combinations of order statistics (L-estimators)

These estimators have the form $\sum_{i=1}^{N} a_i X_{(i)}$, where $X_{(i)}$ is the $i$th
order statistic.

r is defined as $N\alpha$ and $0 \le \alpha$ for the following estimators.
(1) Trimmed Mean (Crow & Siddiqui 1967)

$(N-2r)^{-1} \sum_{i=r+1}^{N-r} X_{(i)}$.

If r is not an integer, the following form is taken
(Andrews et al., 1972)

(2) Winsorized Mean (c.f. Crow & Siddiqui, 1967; Tukey, 1962)
When r is an integer,

if N is odd and $\alpha = 1/2$, then $W_N(\alpha)$ is the median.

(3) Linearly Weighted Mean (Crow & Siddiqui 1967)

(Smoothly Trimmed Mean I (Stigler 1973))

When r is an integer, N = 2n

$L_N(\alpha) = \tfrac{1}{2}(n-r)^{-2}\,[X_{(r+1)} + X_{(N-r)} + 3(X_{(r+2)} + X_{(N-r-1)}) + \cdots
+ (2i-2r-1)(X_{(i)} + X_{(N-i+1)}) + \cdots + (2n-2r-1)(X_{(n)} + X_{(n+1)})]$

N = 2n+1

$L_N(\alpha) = [(n-r)^2 + (n-r+1)^2]^{-1}\,[X_{(r+1)} + X_{(N-r)} + 3(X_{(r+2)} + X_{(N-r-1)})
+ \cdots + (2i-2r-1)(X_{(i)} + X_{(N-i+1)}) + \cdots + (2n-2r-1)(X_{(n)} + X_{(n+2)}) + (2n-2r+1)X_{(n+1)}]$

In particular, when $\alpha = 0$, r = 0

$L_N(0) = [n(n+1)]^{-1} \sum_{i=1}^{n} i\,(X_{(i)} + X_{(2n+1-i)})$,  N = 2n
(Crow 1964)

(4) Smoothly Trimmed Mean II (Stigler 1973)

$L_N(\alpha) = \sum_{i=1}^{N} W_i X_{(i)}$, with weights $W_i$ as given by Stigler (1973).
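The L-estimators above differ only in the weight vector applied to the ordered sample. The Python sketch below (assuming NumPy) builds the weights for the trimmed mean, the Winsorized mean and the $\alpha = 0$ linearly weighted mean. The function names are hypothetical. The fractional-trimming rule and the Winsorized form used here are the commonly quoted textbook versions and are an assumption, since those formulas are not legible in this copy; for N = 10 they reproduce the weights listed for these estimators in Table 1 to within about .001.

```python
import numpy as np

def trimmed_mean_weights(n, alpha):
    """Weights of the alpha-trimmed mean (0 <= alpha < 0.5).
    For non-integer r = n*alpha the two boundary order statistics receive
    the fractional weight 1 - g, g = r - floor(r) (assumed standard form)."""
    r = n * alpha
    k = int(np.floor(r + 1e-12))
    g = max(r - k, 0.0)
    w = np.zeros(n)
    w[k + 1:n - k - 1] = 1.0          # fully retained observations
    w[k] = w[n - k - 1] = 1.0 - g     # partially retained boundary values
    return w / (n * (1.0 - 2.0 * alpha))

def winsorized_mean_weights(n, r):
    """Weights of the Winsorized mean for integer r: the r smallest values
    are replaced by X_(r+1) and the r largest by X_(n-r) (assumed form)."""
    w = np.ones(n)
    w[:r] = 0.0
    w[n - r:] = 0.0
    w[r] += r
    w[n - r - 1] += r
    return w / n

def linearly_weighted_mean_weights(n_half):
    """Weights i / (n(n+1)) of the alpha = 0 linearly weighted mean
    for a sample of size N = 2n, n = n_half."""
    i = np.arange(1, n_half + 1)
    half = i / (n_half * (n_half + 1.0))
    return np.concatenate([half, half[::-1]])

def l_estimate(x, w):
    """Apply a weight vector to the ordered sample."""
    return float(np.dot(w, np.sort(np.asarray(x, dtype=float))))

print(np.round(trimmed_mean_weights(10, 0.05)[:5], 3))
print(np.round(winsorized_mean_weights(10, 1)[:5], 3))
print(np.round(linearly_weighted_mean_weights(5)[:5], 3))
```

Expressing each estimator as a weight vector is what makes the later comparison with OLUMV weights possible: two estimators agree for every sample exactly when their weight vectors agree.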


(iii) Estimators derived from rank tests (R-estimators)
Consider a 2-sample rank test for shift. $y_1, \ldots, y_m$ and
$z_1, \ldots, z_n$ are two independent samples with distribution $F(x)$ and
$F(x-\Delta)$ respectively. To test $\Delta = 0$ against $\Delta > 0$, the following
test statistic $W$ can be used

where $V_i = 1$ if the $i$th smallest entry in the combined sample is
a y, and $V_i = 0$ otherwise. And J is some function defined on
[0,1] such that $J(1-t) = -J(t)$.

An estimate for the location parameter can be derived from
such tests. Determine estimate $T_n(x_1, x_2, \ldots, x_n)$ such that

$W(x_1-T_n, \ldots, x_n-T_n;\ -(x_1-T_n), -(x_2-T_n), \ldots, -(x_n-T_n)) = 0$.

(iv)  Adaptive  estimators  (c.f.  Hogg  1974) 

This  is  a  class  of  estimates  which  will  adapt  to  the  data. 

For  example,  selecting  the  trimming  proportion  of  a  trimmed  mean 
after  the  sample  is  drawn. 

We shall consider only L-estimators and M-estimators in this
study, but as we have seen even in these classes there are a confusingly
large number of candidates. One way of better understanding
their relative merits is as follows.

The originators of these estimates seem to believe that the
parent distributions of the real world are likely to be non-normal
and in particular to be heavy tailed. It would be valuable if these
implied  beliefs  could  be  brought  out  into  the  open,  examined  and 
perhaps  compared  with  practical  reality.  In  order  to  do  this  we 
need  first  to  be  able  to  parameterize  non-normality. 

3.  Distributions 

In past studies of robust estimators, for example Crow (1964),
Crow and Siddiqui (1967), Gastwirth and Cohen (1970), Andrews et al.
(1972), distributions employed have varied from light-tailed distributions
such as the rectangular to heavy-tailed distributions such
as the double exponential, Cauchy and contaminated normal distributions.
A convenient class which includes both light and heavy
tailed distributions which we will employ in our study is the
exponential power family (c.f. Box and Tiao, 1973), the density
functions for which are given by:

$P(y \mid \theta, \sigma, \beta) = \omega(\beta)\,\sigma^{-1} \exp\left\{-c(\beta)\left|\frac{y-\theta}{\sigma}\right|^{2/(1+\beta)}\right\}$,  $-\infty < y < \infty$

where

$c(\beta) = \left\{\frac{\Gamma[\frac{3}{2}(1+\beta)]}{\Gamma[\frac{1}{2}(1+\beta)]}\right\}^{1/(1+\beta)}$,  $\omega(\beta) = \frac{\{\Gamma[\frac{3}{2}(1+\beta)]\}^{1/2}}{(1+\beta)\{\Gamma[\frac{1}{2}(1+\beta)]\}^{3/2}}$,

$\sigma > 0$,  $-\infty < \theta < \infty$,  $-1 < \beta \le 1$.

In the above expression, $\theta$ is a location parameter and $\sigma$ is a
scale parameter. The parameter $\beta$ can be regarded as a measure of
kurtosis indicating the extent of 'non-normality' of the distribution.
As $\beta$ goes from -1 to 1, the distribution goes from platykurtic
to leptokurtic. In the limiting case $\beta \to -1$, the distribution
tends to the rectangular, for $\beta = 0$ it is normal and for
$\beta = 1$ it is the double exponential distribution.
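The density of the exponential power family can be evaluated directly from the expressions for $c(\beta)$ and $\omega(\beta)$. The following Python sketch (assuming NumPy; the function names `c_beta`, `omega_beta` and `exp_power_pdf` are hypothetical) does so and checks numerically that the density integrates to one with variance $\sigma^2$, and that $\beta = 0$ gives the normal density.

```python
import math
import numpy as np

def c_beta(beta):
    """c(beta) = {Gamma[3(1+beta)/2] / Gamma[(1+beta)/2]}^(1/(1+beta))."""
    a = 0.5 * (1.0 + beta)
    return math.exp((math.lgamma(3.0 * a) - math.lgamma(a)) / (1.0 + beta))

def omega_beta(beta):
    """omega(beta) = Gamma[3(1+beta)/2]^(1/2) / ((1+beta) Gamma[(1+beta)/2]^(3/2))."""
    a = 0.5 * (1.0 + beta)
    return math.exp(0.5 * math.lgamma(3.0 * a) - 1.5 * math.lgamma(a)) / (1.0 + beta)

def exp_power_pdf(y, theta=0.0, sigma=1.0, beta=0.0):
    """Exponential power density with location theta, scale sigma, -1 < beta <= 1."""
    z = np.abs((np.asarray(y, dtype=float) - theta) / sigma)
    return omega_beta(beta) / sigma * np.exp(-c_beta(beta) * z ** (2.0 / (1.0 + beta)))

# Numerical check on a fine grid (simple Riemann sums).
grid = np.linspace(-40.0, 40.0, 400001)
dy = grid[1] - grid[0]
for b in (-0.5, 0.0, 0.5, 1.0):
    p = exp_power_pdf(grid, beta=b)
    print(b, np.sum(p) * dy, np.sum(grid ** 2 * p) * dy)
```

Working with `math.lgamma` rather than the gamma function itself is a design choice that avoids overflow and keeps the two constants stable as $\beta$ approaches -1.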

What do estimators imply about $P(\beta)$

Suppose it could be assumed that the parent distribution was
a member of the exponential power family and that a prior distribution
could be written down for $P(\beta)$ which represented the probability
of occurrence of different values of $\beta$ in the particular
experimental situation in the study. Then after putting a noninformative
prior on $\theta$ and $\sigma$ and following the Bayesian procedure
and integrating $\beta$ and $\sigma$ out, the posterior distribution
of $\theta$ would be obtained. A point estimate of $\theta$ would then be
given by the posterior mean which minimizes squared error loss.
Minimization of other loss functions could be achieved by using
other features of the posterior distribution, but we shall suppose
throughout this study that the squared error loss function is
appropriate.

Now if for a particular ad hoc estimator of $\theta$, we could
find a prior distribution $P(\beta)$ such that the resulting posterior
mean closely approximates this estimator, then we could say that in
using this estimator the statistician behaved as if he believed that
$P(\beta)$ represented this distribution of distributions in the real
world. This $P(\beta)$ could then be considered for its reasonableness
and possibly for its concordance with distributions that actually
occurred in the real world. In this case we shall say that we have
found "A Bayesian formulation associated with the estimator."

Contaminated  Exponential  Power  Distribution 

Greater  flexibility  in  the  assumed  form  of  distribution 
could  be  obtained  by  allowing  for  contamination  in  the  following 
way. 

Consider a distribution with density

$P_c(y \mid \theta, \sigma, \beta, \alpha, k) = (1-\alpha)P(y \mid \theta, \sigma, \beta) + \alpha P(y \mid \theta, k\sigma, \beta)$.

That is, with probability $(1-\alpha)$ the observation comes from an
exponential power distribution with kurtosis parameter $\beta$, mean
$\theta$ and variance $\sigma^2$ and with probability $\alpha$ the observation comes
from an exponential power distribution with the same parameter $\beta$,
mean $\theta$ but a much larger scale parameter $k\sigma$.
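To make the contamination mechanism concrete, the Python sketch below draws samples from a contaminated exponential power distribution. It assumes NumPy; the function names are hypothetical. The sampling device used (a gamma variate transformed to the absolute value of a standardized exponential power variate) is a standard change of variables and is an assumption of this sketch rather than something described in the report.

```python
import math
import numpy as np

def c_beta(beta):
    """c(beta) from the exponential power density."""
    a = 0.5 * (1.0 + beta)
    return math.exp((math.lgamma(3.0 * a) - math.lgamma(a)) / (1.0 + beta))

def sample_exp_power(rng, size, beta):
    """Standardized (theta = 0, sigma = 1) exponential power draws.
    If T ~ Gamma((1+beta)/2, 1), then (T / c(beta))^((1+beta)/2) has the
    distribution of |y|; a random sign makes the draw symmetric."""
    shape = 0.5 * (1.0 + beta)
    t = rng.gamma(shape, 1.0, size)
    magnitude = (t / c_beta(beta)) ** shape
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * magnitude

def sample_contaminated(rng, size, theta, sigma, beta, alpha, k=3.0):
    """With probability 1 - alpha use scale sigma, with probability alpha
    use the inflated scale k * sigma; same theta and beta in both cases."""
    z = sample_exp_power(rng, size, beta)
    is_contaminated = rng.random(size) < alpha
    scale = np.where(is_contaminated, k * sigma, sigma)
    return theta + scale * z

rng = np.random.default_rng(0)
y = sample_contaminated(rng, 200000, theta=0.0, sigma=1.0, beta=0.5, alpha=0.1)
# The mixture variance is (1 - alpha) * sigma^2 + alpha * k^2 * sigma^2.
print(np.var(y), 0.9 * 1.0 + 0.1 * 9.0)
```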

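Posterior Distribution of $\theta$

Consider the distribution $P_c(y \mid \theta, \sigma, \beta, \alpha, k)$ with prior
distribution $P(\beta)$ on $\beta$. Assume that $\beta$ is distributed
independently of $\theta$ and $\sigma$, so that the prior distribution for
$\theta$, $\sigma$, $\beta$ is $P(\theta, \sigma, \beta) = P(\beta)P(\theta, \sigma)$. Adopt as noninformative prior
for $(\theta, \sigma)$: $P(\theta, \sigma) \propto \sigma^{-1}$. Then the joint posterior distribution
of $(\theta, \sigma, \beta)$ for a chosen $\alpha$ and k is

$P(\theta, \sigma, \beta \mid Y, \alpha, k) = \frac{\sigma^{-1} P(\beta) \prod_i P_c(y_i \mid \theta, \sigma, \beta, \alpha, k)}{\iiint \sigma^{-1} P(\beta) \prod_i P_c(y_i \mid \theta, \sigma, \beta, \alpha, k)\, d\theta\, d\sigma\, d\beta}$.

Also the marginal posterior distribution of $\theta$ given $\alpha$ and k is

$P(\theta \mid Y, \alpha, k) = \frac{\iint \sigma^{-1} P(\beta) \prod_i P_c(y_i \mid \theta, \sigma, \beta, \alpha, k)\, d\sigma\, d\beta}{\iiint \sigma^{-1} P(\beta) \prod_i P_c(y_i \mid \theta, \sigma, \beta, \alpha, k)\, d\theta\, d\sigma\, d\beta}$

$= \int \frac{P(\theta, \beta, Y \mid \alpha, k)}{P(Y \mid \alpha, k)}\, d\beta = \int \frac{P(\beta, Y \mid \alpha, k)}{P(Y \mid \alpha, k)} \cdot \frac{P(\theta, \beta, Y \mid \alpha, k)}{P(\beta, Y \mid \alpha, k)}\, d\beta$

$= \int P(\beta \mid Y, \alpha, k)\, P(\theta \mid Y, \beta, \alpha, k)\, d\beta$.

Thus the posterior mean of $\theta$ with a given prior $P(\beta)$ is the
weighted average of the posterior mean for each given $\beta$ value
with the weighting function $P(\beta \mid Y, \alpha, k)$. Also if the sample size
is not large there will be little information about $\beta$ coming from
the sample and $P(\beta \mid Y, \alpha, k)$ can be approximated by $P(\beta)$. The
results are not particularly sensitive to k (Box and Tiao (1968))
and we set it equal to a constant 3 for this part of the study.
For the time being therefore the parameter k is treated as a
known constant and dropped from the formulas.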
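For a fixed $\beta$ and $\alpha$ (a point prior on $\beta$), the posterior mean of $\theta$ under the $\sigma^{-1}$ prior can be approximated by brute-force numerical integration. The Python sketch below is one way to do this, assuming NumPy; the function names, the grid limits and the grid sizes are hypothetical choices made for illustration, and the data are invented.

```python
import math
import numpy as np

def c_beta(beta):
    a = 0.5 * (1.0 + beta)
    return math.exp((math.lgamma(3.0 * a) - math.lgamma(a)) / (1.0 + beta))

def omega_beta(beta):
    a = 0.5 * (1.0 + beta)
    return math.exp(0.5 * math.lgamma(3.0 * a) - 1.5 * math.lgamma(a)) / (1.0 + beta)

def contaminated_pdf(y, theta, sigma, beta, alpha, k=3.0):
    """(1 - alpha) P(y | theta, sigma, beta) + alpha P(y | theta, k sigma, beta)."""
    def component(scale):
        z = np.abs((y - theta) / scale)
        return omega_beta(beta) / scale * np.exp(-c_beta(beta) * z ** (2.0 / (1.0 + beta)))
    return (1.0 - alpha) * component(sigma) + alpha * component(k * sigma)

def posterior_mean(y, beta, alpha, k=3.0, n_theta=1201, n_sigma=200):
    """Posterior mean of theta for fixed beta, alpha, k and prior
    P(theta, sigma) proportional to 1/sigma, by summation over a grid."""
    y = np.asarray(y, dtype=float)
    spread = y.max() - y.min()
    theta = np.linspace(y.min() - spread, y.max() + spread, n_theta)
    s0 = np.std(y)
    # A grid equally spaced in log(sigma) absorbs the 1/sigma prior.
    sigma = np.exp(np.linspace(np.log(s0 / 50.0), np.log(s0 * 50.0), n_sigma))
    dens = contaminated_pdf(y[:, None, None], theta[None, :, None],
                            sigma[None, None, :], beta, alpha, k)
    loglik = np.sum(np.log(dens + 1e-300), axis=0)    # shape (n_theta, n_sigma)
    post = np.exp(loglik - loglik.max()).sum(axis=1)  # sigma integrated out
    return float(np.sum(theta * post) / np.sum(post))

data = np.array([-1.3, -0.4, 0.2, 0.5, 0.9, 1.7, 2.2, 0.0, -0.8, 1.1])
print(posterior_mean(data, beta=0.0, alpha=0.0), data.mean())
```

In the uncontaminated normal case ($\beta = 0$, $\alpha = 0$) the marginal posterior of $\theta$ is symmetric about the sample mean, so the printed values should agree closely; this gives a simple check on the grid.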
Ordered Observations

To compare the estimates given by linear combinations of
order statistics with the posterior mean derived from a particular
parent distribution, we need to first approximate the posterior
mean with a weighted average of the ordered observations. Suppose
the parent distribution is $F(y \mid \theta, \sigma)$, then given n ordered observations
$Y = (y_{(1)}, \ldots, y_{(n)})$, calculate the posterior mean
$M_Y$. If, for any sample Y from $F(y \mid \theta, \sigma)$, $M_Y$ is approximately
equal to $\sum_{i=1}^{n} w_i y_{(i)}$ for some fixed set of $w_i$, then we can say
F is "a distribution associated with the estimator $\sum_{i=1}^{n} w_i y_{(i)}$."

In particular, the posterior mean for a normal distribution is
always equal to the sample mean $\bar{y} = n^{-1}\sum_{i=1}^{n} y_{(i)}$, so we can say that
a distribution associated with the sample mean is the normal distribution.

To obtain an approximation more generally one can proceed
as follows:

Given a distribution, take a random sample of size n from
the distribution, and denote the ordered sample by $Y_1 = (y_{1(1)}, \ldots, y_{1(n)})$.
Calculate the posterior mean $M_{Y_1}$. Take a second
random sample and have them ordered and denoted by $Y_2 = (y_{2(1)}, \ldots, y_{2(n)})$,
and calculate the posterior mean $M_{Y_2}$. Repeat the
procedure m times; we get m sets of ordered samples $Y_1, Y_2, \ldots, Y_m$
and their corresponding posterior means $M_{Y_1}, M_{Y_2}, \ldots, M_{Y_m}$. The
relationship we'd like to establish is $M \approx w_1 y_{(1)} + \cdots + w_n y_{(n)}$,
where M is the posterior mean of the sample $(y_{(1)}, \ldots, y_{(n)})$.
More specifically, we are trying to find a set of weights $(w_1, \ldots, w_n)$
such that for any sample from $F(y \mid \theta, \sigma)$ the posterior
mean is approximately the weighted average of the ordered observations
with this set of weights. Regressing $M_{Y_1}, \ldots, M_{Y_m}$ on
$Y_1, \ldots, Y_m$, we get estimated weights $(\hat{w}_1, \ldots, \hat{w}_n)$.

Obviously, these estimated weights $(\hat{w}_1, \ldots, \hat{w}_n)$ depend on the
m sets of random samples chosen; the $\hat{w}_i$'s would be different
if different samples had been taken. However, it is shown in
Appendix 2.A that there is a set of limiting estimates of the
weights as $m \to \infty$ and that these limiting weights are the same
as those weights that give rise to the Ordered Linear Unbiased
Minimum Variance Estimator (OLUMVE). These weights can thus be
calculated without going through the above sampling procedures.
(The method of calculation is given later.) And from now on
we should keep in mind that the posterior mean is approximately the
OLUMV estimator for samples from $F(y \mid \theta, \sigma)$.

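So for a contaminated exponential power distribution with
fixed $\beta$, $\alpha$, the posterior mean $M_{\alpha\beta}$ is approximately
$w_{1\alpha\beta}\, y_{(1)} + \cdots + w_{n\alpha\beta}\, y_{(n)}$, where the $w_{i\alpha\beta}$ are the
weights for the OLUMV estimator. Furthermore, if a prior $P(\beta)$
was put on $\beta$, then the posterior mean can be written as

$M = \int \theta\, P(\theta \mid Y, \alpha)\, d\theta$

$= \int \theta \left(\int P(\theta \mid Y, \beta, \alpha)\, P(\beta \mid Y, \alpha)\, d\beta\right) d\theta$

$= \int P(\beta \mid Y, \alpha) \left(\int \theta\, P(\theta \mid Y, \beta, \alpha)\, d\theta\right) d\beta$

$\approx \left(\int P(\beta \mid Y, \alpha)\, w_{1\alpha\beta}\, d\beta\right) y_{(1)} + \cdots + \left(\int P(\beta \mid Y, \alpha)\, w_{n\alpha\beta}\, d\beta\right) y_{(n)}$.

When the sample size is not large, the posterior distribution
$P(\beta \mid Y, \alpha)$ will be dominated by the prior $P(\beta)$, and $P(\beta \mid Y, \alpha) \approx P(\beta)$.
In this case, the posterior mean is approximately $w_1 y_{(1)} + \cdots + w_n y_{(n)}$,
where $w_i = \int P(\beta)\, w_{i\alpha\beta}\, d\beta$. We can then conclude that
$w_1 y_{(1)} + \cdots + w_n y_{(n)}$ is approximately the OLUMV estimator for the
contaminated exponential power distribution with prior $P(\beta)$ on $\beta$ and parameter $\alpha$.
Conversely, if we are given an L-estimator $w_1 y_{(1)} + \cdots + w_n y_{(n)}$, it
is also possible to find a prior distribution $P(\beta)$ and a value $\alpha$
such that $w_i \approx \int P(\beta)\, w_{i\alpha\beta}\, d\beta$. This $P(\beta)$ and $\alpha$ will then lead
to a posterior mean close to $w_1 y_{(1)} + \cdots + w_n y_{(n)}$. Such a prior is
of course not unique. For simplicity therefore we proceed by
supposing that $P(\beta)$ belonged to the family of beta function priors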


$P(\beta) = \frac{\Gamma(p+q+2)}{2^{p+q+1}\,\Gamma(p+1)\,\Gamma(q+1)}\,(1+\beta)^{p}(1-\beta)^{q}$,  $-1 < \beta < 1$.


This family contains two adjustable parameters p and q. At this
point in the investigation therefore the class of non-normal distributions
is defined by three parameters which could be conveniently
thought of as $\alpha$ and the mean and variance of $P(\beta)$. (Recall k
was fixed at 3.) A comprehensive numerical investigation however
showed that the crucial factor was where the prior was centered; the
variance of the prior distribution did not have much influence on the
weights. In what follows, therefore, we shall employ a point prior
for the distribution for $\beta$.


Simplified  Problem 

Fixing the variance parameter for $P(\beta)$ at zero, we were
left with a point prior at $(\beta, \alpha)$. It was thus possible to find a
single contaminated exponential power distribution for which the
Bayesian posterior mean was almost the same as the L-estimator. In
many cases where the likelihood function is almost symmetric, the
posterior mean is essentially the same as the maximum likelihood
estimate. We subsequently concentrated therefore on finding a contaminated
exponential power distribution associated with each
L-estimator.
The Contaminated Exponential Power Distribution Associated
with the L-estimator

The problem now reduces to the following:

Given an L-estimator $T = w_1 y_{(1)} + \cdots + w_n y_{(n)}$, find $\alpha$
and $\beta$ such that T is approximately the OLUMV estimator for the
contaminated exponential power distribution with parameters $\alpha$ and
$\beta$.


Method of Computing the Weights for OLUMVE

Suppose $Y = (y_{(1)}, \ldots, y_{(n)})'$ is an ordered sample from
distribution $F\left(\frac{y-\theta}{\sigma}\right)$. Let $u_{(i)} = \frac{y_{(i)}-\theta}{\sigma}$, then
$U = (u_{(1)}, \ldots, u_{(n)})'$ is an ordered sample from $F(y)$. Set

$\mathbf{1} = (1, \ldots, 1)'$  ($n \times 1$),
$a = E(U)$  ($n \times 1$),
$V = \mathrm{Var}(U) = (v_{ij})$  ($n \times n$), where $v_{ij} = \mathrm{Cov}(u_{(i)}, u_{(j)})$,
$A = (\mathbf{1}, a)$.

Then $Y = \theta\mathbf{1} + \sigma a + \sigma(U - a) = A\begin{pmatrix}\theta\\ \sigma\end{pmatrix} + e$,  $E(e) = 0$,  $V(e) = \sigma^2 V$.

By the Gauss-Markov Theorem, the best linear unbiased estimator
(minimum variance) is given by

$\begin{pmatrix}\hat\theta\\ \hat\sigma\end{pmatrix} = (A'V^{-1}A)^{-1}A'V^{-1}Y$;

in particular, when F is symmetric, $\hat\theta = \frac{\mathbf{1}'V^{-1}Y}{\mathbf{1}'V^{-1}\mathbf{1}}$.

When the sample size is fixed, a numerical method for calculating
the variance and covariance matrix (V) was given by Lund (1967).
When V is known, the weights for the OLUMV estimator are then given
by $\frac{\mathbf{1}'V^{-1}}{\mathbf{1}'V^{-1}\mathbf{1}}$. Thus we are able to find the weights of OLUMV estimators
for contaminated exponential power distributions with parameters
$\beta$ and $\alpha$. The value of $\beta$ lies between -1 and +1, and the value
of $\alpha$ is between 0 and 1. However, $\alpha$ is the probability of an
observation coming from some distribution with large variance, and
$\alpha$ is expected to be small. The weights which produce the OLUMV estimator
when the sample size is 10 are actually calculated for all pairs
of $(\beta, \alpha)$ with $\alpha$ = 0, .01, .025, .05, .075, .1 and $\beta$ = -.99,
-.75, -.5, -.25, 0, .25, .5, .75, 1.0. A plot of the first five
weights $w_{1\alpha\beta}, \ldots, w_{5\alpha\beta}$ ($w_{i\alpha\beta} = w_{(11-i)\alpha\beta}$ for $6 \le i \le 10$)
and their values are shown in Figure 1. From the figure, it is
clear that the weights change smoothly as $(\beta, \alpha)$ change; it is
then appropriate to use polynomial interpolation to get the weights
$w_{i\alpha\beta}$ with $0 \le \alpha \le .1$ and $-.99 \le \beta \le 1.0$.
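The Gauss-Markov weights above need only the mean vector a and covariance matrix V of the standardized order statistics. The Python sketch below (assuming NumPy) computes the OLUMV weights from a and V. Note that it estimates a and V by Monte Carlo simulation purely for illustration; this is an assumption of the sketch and is not the numerical method of Lund (1967) referred to in the text. All function names are hypothetical.

```python
import numpy as np

def order_stat_moments(sampler, n, n_rep, rng):
    """Monte Carlo estimates of a = E(U) and V = Var(U) for ordered samples
    of size n drawn from the standardized parent distribution."""
    u = np.sort(sampler(rng, (n_rep, n)), axis=1)
    return u.mean(axis=0), np.cov(u, rowvar=False)

def olumv_weights(a, V):
    """Weights of the best linear unbiased estimator of theta: the first
    row of (A' V^{-1} A)^{-1} A' V^{-1}, with A = (1, a)."""
    n = len(a)
    A = np.column_stack([np.ones(n), a])
    Vinv_A = np.linalg.solve(V, A)            # V^{-1} A  (V is symmetric)
    M = A.T @ Vinv_A                          # A' V^{-1} A
    return np.linalg.solve(M, Vinv_A.T)[0]    # row belonging to theta

def olumv_weights_symmetric(V):
    """Symmetric-parent shortcut: 1' V^{-1} / (1' V^{-1} 1)."""
    v = np.linalg.solve(V, np.ones(V.shape[0]))
    return v / v.sum()

def normal_sampler(rng, shape):
    return rng.standard_normal(shape)

rng = np.random.default_rng(1)
a, V = order_stat_moments(normal_sampler, 5, 200000, rng)
print(np.round(olumv_weights(a, V), 3))
print(np.round(olumv_weights_symmetric(V), 3))
```

For a normal parent the weights should come out close to 1/n each, in line with the statement that the normal distribution is associated with the sample mean. Replacing `normal_sampler` with a sampler for a contaminated exponential power distribution would give approximate weights $w_{i\alpha\beta}$, subject to simulation error.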
Figure 1. First five weights for OLUMV estimators when the
underlying distribution is contaminated exponential
power distribution with specified $\alpha$ and $\beta$.

Given an L-estimator $T = w_1 y_{(1)} + \cdots + w_n y_{(n)}$, the contaminated
exponential power distribution associated with T (if there is one)
would be the one with parameters $(\beta, \alpha)$ such that

$T \approx w_{1\alpha\beta}\, y_{(1)} + \cdots + w_{n\alpha\beta}\, y_{(n)}$

or equivalently

$w_1 \approx w_{1\alpha\beta}, \ \ldots, \ w_n \approx w_{n\alpha\beta}$.

To find $(\beta, \alpha)$, we used nonlinear regression with the $w_i$'s as the
dependent variables, the coefficients of the interpolated polynomial
(for predicting the weights for fixed $\alpha$, $\beta$) as independent
variables and $\alpha$, $\beta$ as the regression coefficients to be determined.
The least square estimates $(\hat\beta, \hat\alpha)$ then specify the
distribution associated with $T = w_1 y_{(1)} + \cdots + w_n y_{(n)}$.

5.  The  Results 

The following robust L-estimators were included in this
study with n = 10:

(1) Trimmed mean: $\alpha$ = 1%, 2%, 2.5%, 5%, 10%

(2) Linearly weighted mean: $\alpha$ = 0, 10%

(3) Smoothly trimmed mean II: $\alpha$ = 20%

(4) Squared weighted mean

(5) Cubic weighted mean

(6) Winsorized mean: $\alpha$ = 10%.

For each of the above L-estimators, following the procedure
in the previous section, a distribution (specified by $(\beta, \alpha)$) is
found in the family of the contaminated exponential power distributions
associated with this estimator. The parameter k has been
fixed at three throughout the study. Figure 2 shows the correspondence
between the L-estimators and their associated distributions.
Each point in the $\beta$-$\alpha$ space has a coordinate $(\beta, \alpha)$ and
this in turn represents a distribution in the family of contaminated
exponential power distributions. For example, the squared weighted
mean takes coordinate (.558, .006) and it is associated with
$.994P(y \mid \theta, \sigma, .558) + .006P(y \mid \theta, 3\sigma, .558)$.

Table 1 shows a comparison between the weights of the
robust estimators and the weights produced by the distributions
associated with the estimators. In general, the two sets of
weights, to our satisfaction, are very close.

Examination of Figure 2 shows that all the trimmed means
($\alpha \le 10\%$) and also smoothly trimmed mean have relatively small
values of $\beta$. Further insight is gained by plotting the approximate
confidence contours for the least square estimate $(\hat\beta, \hat\alpha)$
corresponding to each L-estimator (Figure 3). It is seen that
for the trimmed means, smoothly trimmed mean and linearly trimmed
L-estimator            Weights for L-estimator      Value of $\beta$, $\alpha$ for             Weights for ($\beta$,$\alpha$)-estimator
                       (first five)                 OLUMV estimator                  (first five)

1) 1% trimmed mean     .091 .102 .102 .102 .102     $\beta$ = .011, $\alpha$ = .00080      .091 .103 .101 .102 .103
2) 2% trimmed mean     .083 .104 .104 .104 .104     $\beta$ = .020, $\alpha$ = .00162      .084 .105 .101 .104 .106
3) 2.5% trimmed mean   .078 .105 .105 .105 .105     $\beta$ = .026, $\alpha$ = .00217      .079 .107 .102 .105 .107
4) 5% trimmed mean     .056 .111 .111 .111 .111     $\beta$ = .050, $\alpha$ = .00552      .057 .114 .104 .109 .114
5) 10% trimmed mean    .000 .126 .126 .126 .126     $\beta$ = .040, $\alpha$ = .03820      .002 .123 .127 .123 .125

Table 1  Left side: first five weights of the L-estimator.
Right side: first five weights of OLUMV estimator for the
contaminated exponential power distribution (with parameters
($\beta$,$\alpha$)) associated with the estimator.
L-estimator                              Weights for L-estimator      Value of $\beta$, $\alpha$ for             Weights for ($\beta$,$\alpha$)-estimator
                                         (first five)                 OLUMV estimator                  (first five)

6) linearly weighted mean                ?    ?    ?    ?    .167     $\beta$ = .298, $\alpha$ = .00399      ?     .072 .092 .132 .169
7) 10% trimmed linearly weighted mean    .000 ?    .100 .156 .211     $\beta$ = .382, $\alpha$ = .06865      -.012 .045 .099 .155 .211
8) 20% smoothly trimmed mean             ?    ?    .143 .143 .143     $\beta$ = .052, $\alpha$ = .08849      ?     .073 .145 .147 .146
9) squared weighted mean                 ?    ?    .082 .145 .227     $\beta$ = .558, $\alpha$ = .00633      .010  .042 .074 .146 .227
10) cubic weighted mean                  ?    ?    .060 .142 .278     $\beta$ = .786, $\alpha$ = .01080      .000  ?    .054 .150 .275
11) 10% winsorized mean                  .000 .200 .100 .100 .100     $\beta$ = -.149, $\alpha$ = .02451     ?     .180 .124 .094 .081

Table 1 continued (? marks an entry that is not legible)
These correspond to an additional sum of squares of .00025 whi
an  additional  root  mean  square  error  for  an  ordinate  of  .005. 


mean, the contours are rather elongated in the direction of upper
left to lower right, suggesting some trade-off between $\alpha$ and $\beta$.

This is especially true when trimming is involved. Therefore, it
might be possible to fix $\beta$ at zero and find an $\alpha$ such that the
distribution corresponding to $(0, \alpha)$ would approximately produce
the trimmed mean as the posterior mean. In other words, for each
trimmed mean we may be able to find a contaminated normal distribution
for which the use of such trimmed mean is appropriate.

Restricting $\beta = 0$ and repeating the procedure used, the
contaminated normal distribution associated with each trimmed mean
and smoothly trimmed mean were found and, similar to Figure 2 and
Table 1, we now have Figure 4 and Table 2. Table 2 compares
the trimmed means and the estimators produced by the contaminated
normal distributions. The closeness of the two sets of weights
shows that estimators from contaminated normal distributions can
closely approximate trimmed means. Figure 4 gives the
position each trimmed mean takes on the $\alpha$-axis. Figure 5 combines
the results of Figure 2 and Figure 4 with all the trimmed
means restricted to the $\alpha$-axis, and with the other estimators allowed
to take any position in the space.

7.  Interim  Summary 

For  each  robust  L-estimator  considered,  a  distribution 
model  has  been  found  for  which  the  Bayesian  mean  approximates  the 
estimator. A number of conclusions may be drawn, as follows.
Figure 4. Values of $\alpha$ associated with various
L-estimators; $\beta$ is restricted to be zero.
(Points labelled along the $\alpha$-axis, from largest to smallest $\alpha$:
25% trimmed mean, 20% trimmed mean, 20% smoothly trimmed mean,
10% trimmed mean, 1%-5% trimmed mean.)
L-estimator                    Weights for L-estimator      Value of $\alpha$ for     Weights of $\alpha$-estimator
                               (first five)                 OLUMV estimator     (first five)

1) 1% trimmed mean             .091 .102 .102 .102 .102     $\alpha$ = .000893        .093  .105  .101 .101 .101
2) 2% trimmed mean             .083 .104 .104 .104 .104     $\alpha$ = .00179         .086  .110  .102 .101 .101
3) 2.5% trimmed mean           .078 .105 .105 .105 .105     $\alpha$ = .00524         .082  .112  .102 .102 .102
4) 5% trimmed mean             .056 .111 .111 .111 .111     $\alpha$ = .00584         .062  .125  .105 .104 .104
5) 10% trimmed mean            .000 .126 .126 .126 .126     $\alpha$ = .04380         -.002 .128  .134 .121 .119
6) 20% trimmed mean            .000 .000 .167 .167 .167     $\alpha$ = .17454         -.017 .024  .147 .177 .169
7) 25% trimmed mean            .000 .000 .100 ?    ?        $\alpha$ = .26307         -.014 -.000 .110 .198 .210
8) 20% smoothly trimmed mean   .000 .071 .143 .143 .143     $\alpha$ = .09871         -.015 .074  .157 .145 .138

Table 2  Left side: first five weights of the L-estimators.
Right side: $\beta$ restricted to be zero, first five
weights of OLUMV estimator for the contaminated
normal distribution (with parameter $\alpha$)
associated with the estimator on the left.
(? marks an entry that is not legible)

Figure 5  Combining results of Figure 2 and Figure 4.
(Horizontal axis: β, from -.3 to .8; vertical axis: α, from .02 to .28.
Labeled points: 25% trimmed mean, 20% trimmed mean, 20% smoothly
trimmed mean, 10% trimmed mean, 1%-5% trimmed means, 10% Winsorized
mean, linearly weighted mean, squared weighted mean, cubic weighted
mean.)


The study has been effective in bringing into the open the
implied model assumptions inherent in each estimator. Such
implied assumptions can thus be studied, criticized and, where
appropriate, applied to other problems. For example, if trimmed
means perform well for single samples, this implies the
appropriateness of the contaminated normal distribution. This model
may therefore with equal logic be used for more general models
which traditionally employ the normal assumption.

It becomes clear that the Winsorized mean is appropriate for
a contaminated distribution which is slightly light-tailed.
Therefore, unless one had confidence that such were the case,
the Winsorized mean would not be appropriate, since it puts
heavy weight on the next-to-most-extreme observations.
Linearly-weighted, squared-weighted or cubic-weighted means
imply that distributions met in practice are heavy-tailed
with perhaps a small proportion of contamination.

Trimmed means, however, are appropriate if with probability
1-α observations are from a fixed normal distribution and
with probability α observations are from a normal distribution
with the same mean but a larger variance. The trimming
proportion depends on α.
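
As an illustration of how a trimming proportion translates into weights on the ordered observations, the following minimal sketch computes trimmed-mean weights when the trimmed amount n * proportion may be fractional. The function name and the choice n = 10 in the loop are assumptions made for this example only.

```python
def trimmed_mean_weights(n, proportion):
    """Weights on the n ordered observations for a trimmed mean that removes
    n * proportion observations (possibly a fractional number) from each end."""
    g = round(n * proportion, 12)     # amount trimmed from each end
    full = int(g)                     # observations removed completely
    frac = g - full                   # fraction removed from the next observation
    total = n - 2.0 * g               # effective number of observations retained
    weights = []
    for i in range(n):
        d = min(i, n - 1 - i)         # distance from the nearer end of the sample
        if d < full:
            weights.append(0.0)
        elif d == full:
            weights.append((1.0 - frac) / total)
        else:
            weights.append(1.0 / total)
    return weights


# First five weights (the rest follow by symmetry) for several proportions.
for p in (0.01, 0.025, 0.05, 0.10, 0.20, 0.25):
    w = trimmed_mean_weights(10, p)
    print(p, [round(x, 3) for x in w[:5]])
```

The weights are symmetric and sum to one, so only the first half needs to be inspected.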


8.  Implied Assumptions of the M-estimator

Although some of the L-estimators have proved to be useful
in the location estimation problem, their developers have found it
hard to generalize them directly to cover, for example, regression
models, general linear models and non-linear models. M-estimators
are more flexible in this respect. To find the assumptions that
would render each proposed M-estimator appropriate, the most
natural way is to try to find a distribution f which would
make the M-estimator a maximum likelihood estimator. Let ψ(x)
be associated with some M-estimator; then the distribution f(x)
for which the M-estimator will be maximum likelihood must be such that

$$\psi(x) = -\frac{f'(x)}{f(x)},$$

$$\log f(x) = -\int \psi(x)\,dx + c,$$

$$f(x) \propto \exp\Bigl\{-\int \psi(x)\,dx\Bigr\}.$$

If

$$\int_{-\infty}^{\infty} \exp\Bigl\{-\int \psi(x)\,dx\Bigr\}\,dx < \infty,$$

we can properly standardize f, and this
would then be the precise assumption which would make the
M-estimator a maximum likelihood estimator. As we will see, however,
this integral condition is not satisfied by all proposed ψ
functions.

The f's associated with different ψ's

(i)

$$\psi(x) = x \;\Rightarrow\; f(x) \propto e^{-x^2/2},$$

which is the normal distribution.

(ii) Huber's

$$\psi(x,k) = \begin{cases} -k, & x < -k \\ x, & -k \le x \le k \\ k, & k < x \end{cases}$$

$$f(x) = \begin{cases} c_1\,(2\pi)^{-1/2}\,e^{-x^2/2}, & \text{for } |x| < k \\ c_1\,(2\pi)^{-1/2}\,e^{-k|x| + k^2/2}, & \text{for } |x| \ge k \end{cases}$$

where $c_1$ is a constant such that $\int f(x)\,dx = 1$.

(iii) Andrews'

$$\psi(x,c) = \begin{cases} \sin(x/c), & |x| \le c\pi \\ 0, & \text{o.w.} \end{cases}$$

$$f(x) = \begin{cases} a\,e^{c\cos(x/c)}, & |x| \le c\pi \\ a\,e^{-c}, & \text{otherwise} \end{cases}$$

where a is some constant. Since $\int_{-\infty}^{\infty} f(x)\,dx = \infty$, f(x) is not a proper density.

A plot of these f's is given in Figure 6. For Andrews', Hampel's
and Tukey's estimators the tails of f are constant, which makes them
improper. Notice, however, that the constant parts have very small
values, so one possible way to make the densities proper is to
truncate the functions for very large deviations.
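
The difference between a proper and an improper f can be checked numerically. The sketch below is an illustration only: it evaluates exp{-∫ψ} for Huber's and Andrews' ψ (additive constants dropped), integrates over widening intervals with a simple trapezoidal rule, and uses k = 2 and c = 1.42 as example values. The function names are assumptions of this example.

```python
import math


def log_f_huber(x, k):
    """Minus the integral of Huber's psi, up to an additive constant."""
    ax = abs(x)
    if ax < k:
        return -0.5 * x * x
    return -k * ax + 0.5 * k * k


def log_f_andrews(x, c):
    """Minus the integral of Andrews' psi, up to an additive constant."""
    if abs(x) <= c * math.pi:
        return c * math.cos(x / c)
    return -c  # constant tail: this is what makes f improper


def integrate(log_f, lo, hi, steps=20000):
    """Trapezoidal rule for exp(log_f(x)) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (math.exp(log_f(lo)) + math.exp(log_f(hi)))
    for i in range(1, steps):
        total += math.exp(log_f(lo + i * h))
    return total * h


k, c = 2.0, 1.42
for limit in (10.0, 100.0, 1000.0):
    steps = int(100 * limit)
    huber = integrate(lambda x: log_f_huber(x, k), -limit, limit, steps)
    andrews = integrate(lambda x: log_f_andrews(x, c), -limit, limit, steps)
    # The Huber integral settles down; the Andrews integral keeps growing.
    print(limit, round(huber, 4), round(andrews, 4))
```

For Huber's ψ the integral converges, so f can be standardized; for Andrews' ψ it grows linearly with the width of the interval at rate 2·exp(-c), which is why truncation at large deviations is needed.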

To allow comparison with our previous results, we attempt
to obtain contaminated exponential power distributions which yield
maximum likelihood estimators that are as close as possible to the
M-estimators. One way to proceed is to look for a ψ = -f'/f within
the contaminated exponential power family which approximates the
ψ function of the estimator of interest. To obtain a better
approximation, we shall in addition allow k to vary.

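The density for the contaminated exponential power distribution is

$$f(y) = (1-\alpha)\,P(y \mid \theta,\sigma,\beta) + \alpha\,P(y \mid \theta,k\sigma,\beta),$$

where

$$P(y \mid \theta,\sigma,\beta) = \omega(\beta)\,\sigma^{-1}\exp\Bigl\{-c(\beta)\Bigl|\frac{y-\theta}{\sigma}\Bigr|^{2/(1+\beta)}\Bigr\}, \qquad -\infty < y < \infty,$$

$$c(\beta) = \Bigl\{\frac{\Gamma[\tfrac{3}{2}(1+\beta)]}{\Gamma[\tfrac{1}{2}(1+\beta)]}\Bigr\}^{1/(1+\beta)}
\quad\text{and}\quad
\omega(\beta) = \frac{\{\Gamma[\tfrac{3}{2}(1+\beta)]\}^{1/2}}{(1+\beta)\,\{\Gamma[\tfrac{1}{2}(1+\beta)]\}^{3/2}}.$$

Without loss of generality, set θ = 0, σ = 1.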
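
A direct transcription of this density may help in checking the constants. The sketch below is a minimal illustration, not code from the report: the function names are hypothetical, and the numerical check uses a plain trapezoidal rule with example parameter values (β = .2, α = .1, k = 3).

```python
import math


def c_beta(beta):
    """c(beta) of the exponential power family."""
    g3 = math.gamma(1.5 * (1.0 + beta))
    g1 = math.gamma(0.5 * (1.0 + beta))
    return (g3 / g1) ** (1.0 / (1.0 + beta))


def omega_beta(beta):
    """Normalizing constant omega(beta)."""
    g3 = math.gamma(1.5 * (1.0 + beta))
    g1 = math.gamma(0.5 * (1.0 + beta))
    return math.sqrt(g3) / ((1.0 + beta) * g1 ** 1.5)


def exp_power_pdf(y, theta, sigma, beta):
    """Exponential power density P(y | theta, sigma, beta)."""
    z = abs((y - theta) / sigma)
    return omega_beta(beta) / sigma * math.exp(-c_beta(beta) * z ** (2.0 / (1.0 + beta)))


def contaminated_pdf(y, theta, sigma, beta, alpha, k):
    """Mixture: weight 1 - alpha on scale sigma, weight alpha on scale k * sigma."""
    return ((1.0 - alpha) * exp_power_pdf(y, theta, sigma, beta)
            + alpha * exp_power_pdf(y, theta, k * sigma, beta))


def trapezoid(func, lo, hi, steps=40000):
    """Trapezoidal rule for func over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h


# The mixture should integrate to (approximately) one.
area = trapezoid(lambda y: contaminated_pdf(y, 0.0, 1.0, 0.2, 0.1, 3.0), -60.0, 60.0)
print(round(area, 6))
```

With β = 0 the formulas reduce to c = 1/2 and ω = 1/√(2π), i.e. the normal density, and with these constants σ is the standard deviation of the uncontaminated component.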


Figure 6  Function f corresponding to each M-estimator.
(Panels legible in the source: (ii) Andrews', c = 1.42;
(iii) Hampel's, a = 1.42, b = 2.70, c = 5.54; (iv) Tukey's, c = 5.4.
Horizontal axes run from -10 to 10.)


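$$f(x) = (1-\alpha)\,\omega(\beta)\exp\Bigl\{-c(\beta)\,|x|^{2/(1+\beta)}\Bigr\}
+ \frac{\alpha\,\omega(\beta)}{k}\exp\Bigl\{-c(\beta)\,\Bigl|\frac{x}{k}\Bigr|^{2/(1+\beta)}\Bigr\}$$

$$-f'(x) = (1-\alpha)\,\omega(\beta)\,c(\beta)\exp\Bigl\{-c(\beta)\,|x|^{2/(1+\beta)}\Bigr\}\,\frac{2}{1+\beta}\,|x|^{\frac{2}{1+\beta}-1}
+ \frac{\alpha\,\omega(\beta)}{k}\,\frac{c(\beta)}{k}\exp\Bigl\{-c(\beta)\,\Bigl|\frac{x}{k}\Bigr|^{2/(1+\beta)}\Bigr\}\,\frac{2}{1+\beta}\,\Bigl|\frac{x}{k}\Bigr|^{\frac{2}{1+\beta}-1}
\qquad \text{for } x > 0$$

$$\psi(x) = \frac{2\,c(\beta)}{1+\beta}\cdot
\frac{(1-\alpha)\,|x|^{\frac{2}{1+\beta}-1}\exp\bigl\{-c(\beta)\,|x|^{2/(1+\beta)}\bigr\}
+ \dfrac{\alpha}{k^{2}}\,\Bigl|\dfrac{x}{k}\Bigr|^{\frac{2}{1+\beta}-1}\exp\bigl\{-c(\beta)\,|x/k|^{2/(1+\beta)}\bigr\}}
{(1-\alpha)\exp\bigl\{-c(\beta)\,|x|^{2/(1+\beta)}\bigr\}
+ \dfrac{\alpha}{k}\exp\bigl\{-c(\beta)\,|x/k|^{2/(1+\beta)}\bigr\}},
\qquad x > 0,$$

and the ψ function is antisymmetric about x = 0.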
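
The expression for ψ can be verified against a numerical derivative of -log f. The following sketch is an illustration under the convention θ = 0, σ = 1; the function names are hypothetical, ψ(0) is set to zero by antisymmetry, and the code is intended for moderate |x| only (for very large |x| both exponentials underflow).

```python
import math


def c_beta(beta):
    """c(beta) of the exponential power family."""
    g3 = math.gamma(1.5 * (1.0 + beta))
    g1 = math.gamma(0.5 * (1.0 + beta))
    return (g3 / g1) ** (1.0 / (1.0 + beta))


def log_f(x, beta, alpha, k):
    """log f(x) up to the additive constant log omega(beta)."""
    p = 2.0 / (1.0 + beta)
    c = c_beta(beta)
    ax = abs(x)
    return math.log((1.0 - alpha) * math.exp(-c * ax ** p)
                    + (alpha / k) * math.exp(-c * (ax / k) ** p))


def psi(x, beta, alpha, k):
    """psi(x) = -f'(x)/f(x) for the contaminated exponential power density."""
    if x == 0.0:
        return 0.0  # convention: value at the origin fixed by antisymmetry
    sign = 1.0 if x > 0 else -1.0
    ax = abs(x)
    p = 2.0 / (1.0 + beta)
    c = c_beta(beta)
    e1 = math.exp(-c * ax ** p)
    e2 = math.exp(-c * (ax / k) ** p)
    num = (1.0 - alpha) * ax ** (p - 1.0) * e1 + (alpha / k ** 2) * (ax / k) ** (p - 1.0) * e2
    den = (1.0 - alpha) * e1 + (alpha / k) * e2
    return sign * c * p * num / den


# Compare with a central finite difference of -log f at an example point.
x, h = 1.3, 1e-5
finite_diff = -(log_f(x + h, 0.2, 0.1, 3.0) - log_f(x - h, 0.2, 0.1, 3.0)) / (2.0 * h)
print(psi(x, 0.2, 0.1, 3.0), finite_diff)
```

With β = 0 and α = 0 the function reduces to ψ(x) = x, the normal case.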


This ψ function (only the positive half; the other half
is obtained by antisymmetry) is plotted in Figure 7 for different
combinations of β, α and k. Observe that as β becomes
larger, the curve increases faster in the neighborhood of zero and
then slows down to reach its maximum; as α increases, the
curve drops more sharply after reaching its maximum; and as k
increases, the ψ function gets closer to zero and stays small over a
longer range. Except for β = 1, all the ψ functions for contaminated
exponential power distributions eventually increase to infinity.
Even so, the ordinate can remain small within a very wide range,
say between ±6 and ±10, and therefore, for realistic data sets,
these ψ functions will produce results similar to those of the ψ
functions associated with the proposed M-estimators. Tukey's and
Andrews' estimators have ψ functions with shapes somewhat similar to
those in Figure 7. Comparison shows that Tukey's estimator with
c = 4.05 is similar to the maximum likelihood estimator for a
contaminated normal distribution with β = .1, α = .1 and k = 10.0.

As mentioned before, to obtain a better approximation in the tail
of the ψ function in this study of M-estimators, the value of k will
no longer be fixed at three as in the study of L-estimators.
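
The tail behaviour of ψ described above can be read off from its formula. The following is a sketch of the argument, assuming α > 0 and k > 1, so that for large x the wider component dominates both numerator and denominator:

```latex
% For large x, exp{-c(beta) x^{2/(1+beta)}} is negligible relative to
% exp{-c(beta) (x/k)^{2/(1+beta)}}, so only the contaminating terms remain:
\psi(x) \;\approx\; \frac{2\,c(\beta)}{1+\beta}\cdot
        \frac{(\alpha/k^{2})\,(x/k)^{\frac{2}{1+\beta}-1}}{\alpha/k}
      \;=\; \frac{2\,c(\beta)}{1+\beta}\;k^{-2/(1+\beta)}\;x^{(1-\beta)/(1+\beta)},
      \qquad x \to \infty .
% The exponent (1-beta)/(1+beta) is positive for beta < 1, so psi grows without
% bound; for beta = 1 it is zero and psi tends to the constant c(1)/k = \sqrt{2}/k.
```

A larger k reduces the multiplier $k^{-2/(1+\beta)}$, which is consistent with ψ staying small over a longer range as k increases.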


An  Alternative  Approach 

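A different interpretation that can be attached to the
M-estimators is from the point of view of the weighting function.
Write w(x) = ψ(x)/x; then, with s denoting the scale, the equation

$$\sum_i \psi\Bigl(\frac{x_i-\theta}{s}\Bigr) = 0$$

is equivalent to

$$\sum_i \Bigl(\frac{x_i-\theta}{s}\Bigr)\,w\Bigl(\frac{x_i-\theta}{s}\Bigr) = 0
\quad\text{and}\quad
\theta = \frac{\sum_i x_i\,w\bigl(\frac{x_i-\theta}{s}\bigr)}{\sum_i w\bigl(\frac{x_i-\theta}{s}\bigr)}
\qquad \text{(Beaton \& Tukey 1974)}.$$

w(x) can, therefore, be viewed as the weight given by the estimator
to the observation.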
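
Because θ appears on both sides of the weighted-mean form, the estimate is usually found by iteration. The sketch below is one simple way to do this, not necessarily the procedure used in the report: it starts from the median, treats the scale as known and fixed (an assumption of this example), and uses a made-up sample containing one gross outlier.

```python
import statistics


def tukey_weight(u, c):
    """Biweight w-function: (1 - (u/c)^2)^2 inside [-c, c], zero outside."""
    if abs(u) <= c:
        return (1.0 - (u / c) ** 2) ** 2
    return 0.0


def reweighted_location(xs, weight, scale, tol=1e-10, max_iter=200):
    """Iterate theta <- sum(x_i w_i) / sum(w_i), with w_i = w((x_i - theta)/scale)."""
    theta = statistics.median(xs)
    for _ in range(max_iter):
        ws = [weight((x - theta) / scale) for x in xs]
        total = sum(ws)
        if total == 0.0:
            break  # every observation has zero weight; keep the current value
        new_theta = sum(w * x for w, x in zip(ws, xs)) / total
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta


# Hypothetical data: nine well-behaved values and one "bad" observation.
sample = [-1.2, -0.7, -0.3, -0.1, 0.0, 0.2, 0.4, 0.9, 1.1, 12.0]
robust = reweighted_location(sample, lambda u: tukey_weight(u, 4.05), 1.0)
plain = reweighted_location(sample, lambda u: 1.0, 1.0)  # w = 1 gives the mean
print(robust, plain)
```

With w(x) = 1 the iteration returns the sample mean, while the biweight gives the outlying value zero weight.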

With this approach, each M-estimator is characterized by its
w-function. The following are some weighting functions for the
M-estimators, the plots of which are shown in Figure 9. In terms
of weight functions, w(x) = ψ(x)/x:

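(i) normal

$$w(x) = 1$$

(ii) Huber's

$$w(x,k) = \begin{cases} -k/x, & x < -k \\ 1, & -k \le x \le k \\ k/x, & x > k \end{cases}$$

(iii) Andrews'

$$w(x,c) = \begin{cases} \sin(x/c)/x, & |x| \le c\pi \\ 0, & \text{o.w.} \end{cases}$$

(iv) Hampel's

$$w(x,a,b,c) = \frac{\operatorname{sgn} x}{x}\begin{cases} |x|, & 0 \le |x| < a \\ a, & a \le |x| < b \\ a\,\dfrac{c-|x|}{c-b}, & b \le |x| < c \\ 0, & |x| \ge c \end{cases}$$

(v) Tukey's biweight

$$w(x,c) = \begin{cases} \bigl(1-(x/c)^2\bigr)^2, & |x| \le c \\ 0, & \text{o.w.} \end{cases}$$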
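
For side-by-side numerical comparison, the weighting functions listed above can be written out directly. This is a minimal sketch; the function names are assumptions of this example, Andrews' weight at x = 0 is set to its limit 1/c, and the tuning constants in the loop are example values.

```python
import math


def w_huber(x, k):
    """Huber: full weight inside [-k, k], k/|x| outside."""
    return 1.0 if abs(x) <= k else k / abs(x)


def w_andrews(x, c):
    """Andrews: sin(x/c)/x inside [-c*pi, c*pi], zero outside."""
    if x == 0.0:
        return 1.0 / c  # limit of sin(x/c)/x as x -> 0
    return math.sin(x / c) / x if abs(x) <= c * math.pi else 0.0


def w_hampel(x, a, b, c):
    """Hampel: psi is linear, then flat, then descends linearly to zero at c."""
    ax = abs(x)
    if ax < a:
        return 1.0
    if ax < b:
        return a / ax
    if ax < c:
        return a * (c - ax) / ((c - b) * ax)
    return 0.0


def w_tukey(x, c):
    """Tukey's biweight."""
    return (1.0 - (x / c) ** 2) ** 2 if abs(x) <= c else 0.0


for x in (0.0, 1.0, 2.0, 4.0, 6.0):
    print(x,
          round(w_huber(x, 2.0), 3),
          round(w_andrews(x, 1.42), 3),
          round(w_hampel(x, 1.42, 2.70, 5.54), 3),
          round(w_tukey(x, 5.4), 3))
```

Huber's weight never reaches zero, whereas the other three give no weight at all to sufficiently extreme observations.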


Figure 9  w-functions for various M-estimators.
(Panels: (i) Normal; (ii) Huber's, k = 2; (iii) Andrews', c = 1.42;
(iv) Hampel's; (v) and (vi) Tukey's, for two values of c.
Horizontal axes run from -10 to 10.)

Corresponding to each contaminated exponential power distribution,
there is also a w-function w(x) = ψ(x)/x, and this again can be
used in picking out the distribution that would lead to the robust
M-estimator proposed. The w-functions are plotted (Figure 10) for
contaminated exponential power distributions with α = .025, .05,
.075, .1 and β = 0, .2, .4, and also k = 1.0, 2.0, 4.0, 6.0, 8.0,
10.0. From the formula for the w-function, it is easy to see that
for β > 0, w(x) is always infinite at x = 0. Although this can
cause trouble when actually solving for the estimate for some data
set, comparisons are still possible with the weighting functions of
the M-estimators.
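
The behaviour of w at the origin follows from the formula for ψ. As a sketch of the reasoning: for small x > 0 both exponential factors are close to one, so

```latex
w(x) = \frac{\psi(x)}{x}
\;\approx\; \frac{2\,c(\beta)}{1+\beta}\cdot
  \frac{(1-\alpha) + \alpha\,k^{-\frac{2}{1+\beta}-1}}{(1-\alpha) + \alpha/k}\;
  x^{\frac{2}{1+\beta}-2},
\qquad x \to 0^{+},
\qquad\text{where}\quad \frac{2}{1+\beta}-2 = -\frac{2\beta}{1+\beta}.
% beta > 0: negative exponent, so w(x) -> infinity as x -> 0.
% beta = 0: exponent zero and 2c(0)/(1+0) = 1, so
%           w(0) = (1 - alpha + alpha/k^3) / (1 - alpha + alpha/k), a finite value.
% beta < 0: positive exponent, so w(x) -> 0 as x -> 0.
```

Only in the contaminated normal case β = 0 does w take a finite, non-zero value at the origin.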

Comparing Figure 9 with Figure 10, we can make the
following matchings.

1. Huber's estimator             -  β = 0    α = .05    k = 2.0

2. Hampel's estimator            -  β = 0    α = .10    k = 8.0
                        or maybe    β = 0    α = .05    k = 4.0

3. Andrews' estimator            -  β = 0    α = .10    k = 4.0

4. Tukey's estimator (c = 4.05)  -  β = .2   α = .10    k = 10.0
                        or maybe    β = 0    α = .10    k = 4.0

5. Tukey's estimator (c = 5.40)  -  β = .2   α = .025   k = 10.0


The correspondence is far from exact; the w-functions for these
estimators simply cannot be reproduced very closely by our family of
distributions. However, we do know that estimators of this kind are not
very sensitive to minor discrepancies in the shape of the ψ functions,
in that similarly shaped weight functions will produce very close
estimates.
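
To see how rough such a matching is, the w-function of a contaminated normal distribution (β = 0) can be tabulated next to Huber's. The sketch below specializes w(x) = ψ(x)/x to β = 0, where c(β) = 1/2, and uses the first matching (α = .05, k = 2.0 against Huber's k = 2) as the example. The function names are assumptions of this example, and the code is intended for moderate |x| only.

```python
import math


def w_contaminated_normal(x, alpha, k):
    """w(x) = psi(x)/x for the contaminated normal density (beta = 0)."""
    near = (1.0 - alpha) * math.exp(-0.5 * x * x)       # central component
    far = (alpha / k) * math.exp(-0.5 * (x / k) ** 2)   # contaminating component
    return (near + far / k ** 2) / (near + far)


def w_huber(x, k):
    """Huber's weighting function."""
    return 1.0 if abs(x) <= k else k / abs(x)


for x in (0.0, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0):
    print(x, round(w_contaminated_normal(x, 0.05, 2.0), 3), round(w_huber(x, 2.0), 3))
```

The contaminated normal weight starts just below one at the origin and levels off at 1/k² for large |x|, whereas Huber's weight keeps decreasing like k/|x|, so agreement can only be approximate over a limited range.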


Figure 10  w-functions for contaminated exponential power
distributions (continued over several pages).
(Panel columns: β = 0, β = .2, β = .4; horizontal axes run from -8 to 8.)

The comparison does provide an approximate idea of the
nature of these estimators in terms of the β-α space. The
β-coordinates are close to zero, strongly suggesting that estimators
with properties similar to the M-estimators studied above will be
produced by the contaminated normal model.

9.  Summary of the present chapter

Models in the (β, α) family of distributions have been
found which generate estimators similar to published L-estimators
and M-estimators. The contaminated normal model can produce
estimates similar to trimmed means and to the M-estimates of Huber,
Tukey, Hampel and Andrews. Among the L-estimators, linearly-
weighted, squared-weighted and cubic-weighted means imply the
need for the assumption that the parent distribution has heavy tails
in addition to some contamination. The Winsorized mean implies
that the parent distribution is somewhat light-tailed with
contamination.

Appendix A

Proof that "the limiting weights obtained in Section 2.4
as m → ∞ are the same as weights for OLUMV estimators."

Let us assume a random sample of size n is taken from a
distribution P(Y|θ) and the ordered observations are
$Y = (y_1, \dots, y_n)$. Also assume P(Y|θ) is symmetric about θ,
so that a linear combination of order statistics will be unbiased
if and only if the weights are symmetric about its center.

Let $\mathcal{A}$ be the set of all symmetric weights and $\mathcal{Y}$ be the
sample space. If $a_1 y_1 + \dots + a_n y_n$ is the OLUMV estimator, then we have

$$\int_{\mathcal{Y}} (a_1 y_1+\dots+a_n y_n-\theta)^2\,P(Y\mid\theta)\,dY
= \min_{(A_1,\dots,A_n)\in\mathcal{A}} \int_{\mathcal{Y}} (A_1 y_1+\dots+A_n y_n-\theta)^2\,P(Y\mid\theta)\,dY. \tag{A.1}$$

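What we have done in Section 2.4 is to take m sets of random
samples $(y_{i1},\dots,y_{in})$, i = 1,...,m, and find weights that
minimize

$$\sum_{i=1}^{m} (A_1 y_{i1}+\dots+A_n y_{in}-M_{Y_i})^2$$

or, equivalently, minimize

$$\frac{1}{m}\sum_{i=1}^{m} (A_1 y_{i1}+\dots+A_n y_{in}-M_{Y_i})^2$$

as m → ∞.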

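This is the same as finding weights that minimize

$$\int_{\mathcal{Y}} (A_1 y_1+\dots+A_n y_n-M_Y)^2\,P(Y\mid\theta)\,dY$$

$$= \int_{\mathcal{Y}} (A_1 y_1+\dots+A_n y_n-\theta+\theta-M_Y)^2\,P(Y\mid\theta)\,dY$$

$$= \int_{\mathcal{Y}} (A_1 y_1+\dots+A_n y_n-\theta)^2\,P(Y\mid\theta)\,dY
+ \int_{\mathcal{Y}} (\theta-M_Y)^2\,P(Y\mid\theta)\,dY
+ 2\int_{\mathcal{Y}} (A_1 y_1+\dots+A_n y_n-\theta)(\theta-M_Y)\,P(Y\mid\theta)\,dY. \tag{A.2}$$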

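Let $T = A_1 y_1+\dots+A_n y_n$; the second and third terms of
(A.2) become

$$\int_{\mathcal{Y}} (\theta-M_Y)^2\,P(Y\mid\theta)\,dY + 2\int_{\mathcal{Y}} (T-\theta)(\theta-M_Y)\,P(Y\mid\theta)\,dY$$

$$= \int_{\mathcal{Y}} (\theta^2-2\theta M_Y+M_Y^2+2T\theta-2\theta^2-2TM_Y+2\theta M_Y)\,P(Y\mid\theta)\,dY$$

$$= \int_{\mathcal{Y}} (M_Y^2-\theta^2+2T\theta-2TM_Y)\,P(Y\mid\theta)\,dY.$$

Since $M_Y^2-\theta^2$ does not depend on $A_1,\dots,A_n$, we want to find
weights that minimize

$$\int_{\mathcal{Y}} (T-\theta)^2\,P(Y\mid\theta)\,dY + 2\int_{\mathcal{Y}} T(\theta-M_Y)\,P(Y\mid\theta)\,dY.$$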

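If we now put a prior on θ, P(θ) = 1/(2c), -c < θ < c, we are
looking for weights that minimize

$$\int_{-c}^{c}\int_{\mathcal{Y}} (A_1 y_1+\dots+A_n y_n-\theta)^2\,P(Y\mid\theta)\,\frac{1}{2c}\,dY\,d\theta
+ 2\int_{-c}^{c}\int_{\mathcal{Y}} T(\theta-M_Y)\,P(Y\mid\theta)\,\frac{1}{2c}\,dY\,d\theta. \tag{A.3}$$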

Now if c is large enough, after proper standardization,
P(Y|θ)·(1/2c) will be very close to the posterior distribution with
noninformative prior P(θ) ∝ constant. Changing the order of
integration in the second term of (A.3), we have

$$\int_{-c}^{c} (\theta-M_Y)\,P(Y\mid\theta)\,\frac{1}{2c}\,d\theta \to 0 \quad\text{as } c \to \infty,$$

so

$$\int_{\mathcal{Y}}\int_{-c}^{c} T(\theta-M_Y)\,P(Y\mid\theta)\,\frac{1}{2c}\,d\theta\,dY \to 0 \quad\text{as } c \to \infty.$$

Thus the second term of (A.3) drops out as c → ∞.

The limiting weights will minimize

$$\int_{-c}^{c}\int_{\mathcal{Y}} (A_1 y_1+\dots+A_n y_n-\theta)^2\,P(Y\mid\theta)\,\frac{1}{2c}\,dY\,d\theta$$

when c is large enough. Since $\int_{\mathcal{Y}} (A_1 y_1+\dots+A_n y_n-\theta)^2\,P(Y\mid\theta)\,dY$
is independent of θ, this is the same as minimizing
$\int_{\mathcal{Y}} (A_1 y_1+\dots+A_n y_n-\theta)^2\,P(Y\mid\theta)\,dY$, which leads to the same result as
(A.1). Therefore $a_1,\dots,a_n$ are our limiting weights.